Fault Tolerance in Hadoop for Work Migration
Abstract
2. Introduction

Hadoop [1] is an open-source software framework implemented in Java and designed for use on large distributed systems. Hadoop is a project of the Apache Software Foundation, and its open-source nature has contributed to its popularity. Yahoo! has contributed about 80% of the main core of Hadoop [3], and many other large technology organizations have used or currently use it, such as Facebook, Twitter, and LinkedIn [3]. The Hadoop framework comprises many different projects; two of the main ones are the Hadoop Distributed File System (HDFS) and MapReduce. HDFS is designed to work with the MapReduce paradigm. This survey paper focuses on HDFS and how it was implemented to be highly fault tolerant, because fault tolerance is an essential part of modern distributed systems.
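A central mechanism behind HDFS fault tolerance is block replication: each file block is stored on several DataNodes, so the failure of a single node does not lose data. As a minimal illustration, the replication factor is controlled by the standard `dfs.replication` property in `hdfs-site.xml` (the value 3 shown here is the HDFS default; an actual deployment would tune this to its cluster size):

```xml
<!-- hdfs-site.xml: replication setting relevant to fault tolerance -->
<configuration>
  <property>
    <!-- Number of copies HDFS keeps of each block; 3 is the default -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

With three replicas, the NameNode can detect a lost DataNode via missed heartbeats and re-replicate its blocks from the surviving copies, which is what lets MapReduce jobs continue after individual machine failures.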
Similar resources
Pre-stack Kirchhoff Time Migration on Hadoop and Spark
Pre-stack Kirchhoff time migration (PKTM) is one of the most widely used migration algorithms in the seismic imaging area. However, PKTM takes considerable time due to its high computational cost, which greatly affects the working efficiency of the oil industry. Due to its high fault tolerance and scalability, Hadoop has become the most popular platform for big data processing. To overcome the shortcom...
A multi-agent simulation framework on a small Hadoop cluster
In this paper, we explore the benefits and possibilities of implementing a multi-agent simulation framework on a Hadoop cloud. Scalability, fault tolerance, and failure recovery have always been a challenge for distributed systems application developers. The highly fault-tolerant nature of Hadoop, its flexibility to include more systems on the fly, efficient load balancing, and ...
Using Hadoop as a Grid Storage Element
Hadoop is an open-source data processing framework that includes a scalable, fault-tolerant distributed file system, HDFS. Although HDFS was designed to work in conjunction with Hadoop's job scheduler, we have re-purposed it to serve as a grid storage element by adding GridFTP and SRM servers. We have tested the system thoroughly in order to understand its scalability and fault tolerance. The tu...
Redoop: Supporting Recurring Queries in Hadoop
The growing demand for large-scale data analytics, ranging from online advertisement placement and log processing to fraud detection, has led to the design of highly scalable data-intensive computing infrastructures such as the Hadoop platform. Recurring queries, repeatedly executed for long periods of time over rapidly evolving high-volume data, have become a bedrock component in most of thes...
Publication date: 2011